Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci
Abstract
Genetic sequence data provide information about the distances between species, or branch lengths in a phylogeny, but not directly about absolute divergence times or evolutionary rates. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information from the fossil record to calibrate the molecular tree. Because times and rates are confounded, posterior time estimates will not converge to point values even if an infinite amount of sequence data is used in the analysis. In a previous study, we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation for large but finite sequence data sets under a strict molecular clock. As most modern clock-dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to relaxed-clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci (L) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/L, and that this limit is independent of the number of sites in the sequence alignment. We then confirm the predictions by computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species.
Our results suggest that, with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way to improve the precision of posterior time estimates. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates.
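The qualitative 1/L prediction can be illustrated with a small Gaussian calculation. The sketch below is a hypothetical toy model, not the multivariate normal problem analyzed in the paper: x stands for the log divergence time with a fossil-calibration prior of variance a, m for the mean log rate with prior variance b, and each locus contributes one summary observation y_i = x + m + u_i + e_i, where u_i (variance tau2) represents rate variation among loci and e_i (variance eps2 / n_sites) represents sampling error from a finite alignment. Because the data inform only the sum x + m, the posterior variance of x falls toward a nonzero floor a*b/(a + b) as loci are added. All function names, parameter names, and values here are illustrative assumptions.

```python
def posterior_var_time(a, b, tau2, eps2, n_sites, n_loci):
    """Posterior variance of log-time x in a hypothetical Gaussian toy model.

    Assumed model (illustration only, not the paper's formulation):
        x   ~ N(0, a)               log divergence time; a = calibration prior variance
        m   ~ N(0, b)               mean log rate; b = rate prior variance
        y_i = x + m + u_i + e_i     one summary observation per locus, i = 1..n_loci
        u_i ~ N(0, tau2)            rate variation among loci
        e_i ~ N(0, eps2 / n_sites)  sampling error from a finite alignment
    """
    v = tau2 + eps2 / n_sites        # per-locus noise variance
    var_ybar = a + b + v / n_loci    # variance of the locus mean, a sufficient statistic here
    # Gaussian conditioning: Var(x | ybar) = Var(x) - Cov(x, ybar)^2 / Var(ybar),
    # with Cov(x, ybar) = Var(x) = a
    return a - a * a / var_ybar


def infinite_data_limit(a, b):
    """Floor as n_loci -> infinity: x + m becomes known, but x and m stay confounded."""
    return a * b / (a + b)


if __name__ == "__main__":
    a, b, tau2, eps2 = 1.0, 1.0, 0.5, 1.0  # illustrative values only
    floor = infinite_data_limit(a, b)
    for n_sites in (100, 10_000):
        for n_loci in (1, 10, 100, 1000):
            var = posterior_var_time(a, b, tau2, eps2, n_sites, n_loci)
            # (var - floor) * n_loci levels off as n_loci grows,
            # i.e. the excess over the floor decays roughly like 1/L
            print(n_sites, n_loci, round(var, 5), round((var - floor) * n_loci, 4))
```

Under these assumptions, more sites per locus only shrink the eps2 / n_sites part of the per-locus noise, which changes how quickly the floor is approached but not the floor itself; this mirrors, in a much simplified setting, the prediction that the infinite-data limit does not depend on alignment length.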
Similar resources
Dating divergence of Polystigma and other Sordariomycetes
Studies on the evolutionary history of ascomycetes in terms of time scale will help to understand historical patterns that shape their biodiversity. Until now, most dating studies of ascomycetes have focused on major events in fungal evolution but not on divergence events within smaller groups of fungi, e.g., within Sordariomycetes. We used molecular dating to estimate the time of separation of...
An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times
The explosive growth of molecular sequence data has made it possible to estimate species divergence times under relaxed-clock models using genome-scale data sets with many gene loci. In order both to improve model realism and to best extract information about relative divergence times in the sequence data, it is important to account for the heterogeneity in the evolutionary process across genes...
Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene Loci and calibration points, with application to a radiation of cute-looking mouse lemur species.
Divergence time and substitution rate are seriously confounded in phylogenetic analysis, making it difficult to estimate divergence times when the molecular clock (rate constancy among lineages) is violated. This problem can be alleviated to some extent by analyzing multiple gene loci simultaneously and by using multiple calibration points. While different genes may have different patterns of e...
Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds.
We implement a Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes. A birth-death process with species sampling is used to specify a prior for divergence times, which allows easy assessment of the effects of that prior on posterior time estimates. We propose a ...
Relaxed molecular clocks, the bias-variance trade-off, and the quality of phylogenetic inference.
Because a constant rate of DNA sequence evolution cannot be assumed to be ubiquitous, relaxed molecular clock inference models have proven useful when estimating rates and divergence dates. Furthermore, it has been recently suggested that using relaxed molecular clocks may provide superior accuracy and precision in phylogenetic inference compared with traditional time-free methods that do not i...